In the spring of 2021, the Brood X cicadas emerged throughout much of the Mid-Atlantic region. Brood X are periodical cicadas that spend most of their lives underground, emerge in the spring every 17 years, breed, and then die. In their adult form, they produce an incredibly loud song to attract mates.
Our team at the Smithsonian deployed automated sound recording units (ARUs) throughout the District of Columbia, Maryland, and Virginia. Our goal for this study was to determine how bird song may vary in response to cicada noise along a gradient of anthropogenic noise and urban intensity.
In this exercise, you will explore cicada observations collected by iNaturalist community science participants. Early reports by iNaturalist observers helped us to determine the appropriate locations to place our ARUs.
The points allotted for each question are provided in highlighted red bold text (e.g., [1.0]) within the question itself. When applicable, total points for a question may represent the sum of individually graded components, which are provided in red text (e.g., [1.0]).
Points may be deducted from each question’s total:
eval = TRUE (but see below);Note: The maximum deduction is the total points value for a given question
In addition to points allotted per question, you must ensure that your R Markdown document runs out-of-the-box [25% off of the total grade] – in other words, the document will knit without error.
In this assignment, you may use only the following R
functions (Note: If you are unclear on what a given function does,
use ? to view the help file!):
:::()<-===!!=%in%%>%Note: The packages dplyr, ggplot2, magrittr, readr, rlang, and
tibble are all part of the tidyverse and are loaded with
library(tidyverse).
As always, please ensure that you are starting with a clean session!
1. [0.5] Save and knit this document:
problem_set_3_Evans_Brian.rmdAll of my scripts start with a setup section where I load libraries, set session options, conduct initial data loading, and early processing steps.
2. [0.25] Load the sf and
tidyverse libraries.
library(tidyverse)
library(sf)
3. [0.5] Set the theme of all of
the plots in this document to theme_void().
theme_set(theme_void())
4. [1.0] Read in and process the file
cicadas_brood_x_2021.csv. In doing so:
datetime, scientific_name, user,
longitude, and latitude;cicadas_temp.cicadas_temp <-
read_csv("data/raw/cicadas_brood_x_2021.csv") %>%
filter(quality_grade == "research") %>%
select(datetime:user,
longitude,
latitude)
The cicadas_temp file represents research-grade
iNaturalist observations of Brood X periodical cicadas from February
through September of 2021. These data contain the following fields:
datetime: The date and time of the observation,
formatted as an ISO 8601 (yyyy-mm-dd hh:mm:ss) datetime object and
reported in UTC;scientific_name: The genus and species of the observed
cicada;user: The user’s login name;place_guess: Best guess for the location of the
observation;longitude: The longitude of an observation, recorded in
EPSG 4326;latitude: The latitude of an observation, recorded in
EPSG 4326;image_url: The address of the image online;quality_grade: Whether the species identification can
be trusted, based on expert opinion – options include “casual”,
“needs_id”, and “research” (verified by experts);description: Notes that users may submit with their
observations.5. [1.25] Read and processes the file
counties.geojson (or
counties_low_res.geojson if you have memory issues!).
In doing so:
geoid, name, and state_name (in
that order);counties_temp.counties_temp <-
st_read("data/raw//shapefiles/counties_low_res.geojson") %>%
set_names(names(.) %>%
tolower()) %>%
select(geoid,
name,
state_name) %>%
st_transform(crs = 2283)
## Reading layer `counties_small' from data source
## `C:\Users\User\Documents\week4\data\raw\shapefiles\counties_low_res.geojson'
## using driver `GeoJSON'
## Simple feature collection with 3221 features and 12 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: -179.1743 ymin: 17.91377 xmax: 179.7739 ymax: 71.35256
## Geodetic CRS: NAD83
The counties_temp file is an sf
multipolygon shapefile where each feature represents a county in the
United States. These data contain the following fields:
geoid: A primary key associated with each shape;name: The county name;state_name: The name of the US state where the county
is located.6. [1.0] Brood X cicada are comprised of
three species, Magicicada cassinii, M. septendecim,
and M. septendecula. A large proportion of the observations
(about 60%) could not be identified to species (you can explore this
with group_by() and summarize()) and one of
the observations was of a 13-year periodical cicada, M.
tredecim (Note: This species is not a Brood X
cicada!).
cicadas_brood_x;cicadas_temp from your global environment (in a separate
code block).# cicadas_brood_x %>%
# group_by(scientific_name) %>%
# summarize(n = n())
cicadas_brood_x <-
cicadas_temp %>%
filter(
scientific_name %in%
c(
"Magicicada cassinii",
"Magicicada septendecim",
"Magicicada septendecula"
)
)
rm(cicadas_temp)
When working with spatial data, it’s often necessary to subset the spatial extent of your files to your study area. This will save a lot of processing time (and, potentially, help you avoid out-of-memory errors!).
7. [1.0] Subset counties_temp
to counties in the District of Columbia, Maryland, or Virginia [0.50], globally assign the resultant object
with the name counties [0.25], and in a separate code block, remove
counties_temp from your global environment [0.25].
counties <-
counties_temp %>%
filter(state_name %in% c("District of Columbia",
"Maryland",
"Virginia"))
rm(counties_temp)
8. [1.5] Convert
cicadas_brood_x to an sf point file [0.25] and subset to our area of interest. In
doing so:
counties;cicadas_sf;cicadas_brood_x from your global environment (in a separate
code block).cicadas_sf <-
cicadas_brood_x %>%
st_as_sf(coords = c('longitude', 'latitude'),
crs = 4326) %>%
st_transform(st_crs(counties)) %>%
st_join(counties) %>%
filter(!is.na(state_name))
rm(cicadas_brood_x)
In evaluating the distributions of Brood X for our experiment, we were interested in when each species of cicada emerged and where the most cicadas were observed.
9. [1.5] Generate a kable
[0.25] that displays the
datetime that each species of cicada [0.50] was first observed in
each state [0.50]. Please arrange the
table from earliest to latest datetime [0.50] (Note: The columns of the summary
table should be state_name, scientific_name,
and datetime).
cicadas_sf %>%
as_tibble() %>%
group_by(state_name,
scientific_name) %>%
summarize(datetime = min(datetime)) %>%
arrange(datetime) %>%
knitr::kable()
| state_name | scientific_name | datetime |
|---|---|---|
| District of Columbia | Magicicada septendecim | 2021-05-03 15:56:30 |
| Virginia | Magicicada septendecim | 2021-05-12 10:25:33 |
| Maryland | Magicicada septendecim | 2021-05-14 08:16:27 |
| Virginia | Magicicada cassinii | 2021-05-15 13:45:17 |
| Maryland | Magicicada cassinii | 2021-05-16 12:35:20 |
| Virginia | Magicicada septendecula | 2021-05-17 13:45:23 |
| Maryland | Magicicada septendecula | 2021-05-18 12:56:55 |
| District of Columbia | Magicicada cassinii | 2021-05-19 11:36:08 |
| District of Columbia | Magicicada septendecula | 2021-05-21 10:52:00 |
10. [1.5] Generate a choropleth map
of counties [0.50], where:
cicadas_sf %>%
as_tibble() %>%
group_by(name) %>%
summarize(n = n()) %>%
full_join(counties,
.,
by = 'name') %>%
ggplot() +
geom_sf(data = counties) +
geom_sf(aes(fill = n)) +
scale_fill_viridis_c(option = "plasma",
na.value = "#dcdcdc")